
83 research outputs found

Efficient Randomized Algorithms for the Fixed-Precision Low-Rank Matrix Approximation

Authors: Gu Yu; Li Yaohang; Yu Wenjian
Publication date: 01/01/2018

Randomized algorithms for low-rank matrix approximation are investigated, with emphasis on the fixed-precision problem and computational efficiency for handling large matrices. The algorithms are based on the so-called QB factorization, where Q is an orthonormal matrix. First, a mechanism for calculating the approximation error in the Frobenius norm is proposed, which enables efficient adaptive rank determination for large and/or sparse matrices. It can be combined with any QB-form factorization algorithm in which B's rows are incrementally generated. Based on the blocked randQB algorithm by P.-G. Martinsson and S. Voronin, this results in an algorithm called randQB_EI. Then, we further revise the algorithm to obtain a pass-efficient algorithm, randQB_FP, which is mathematically equivalent to the existing randQB algorithms and also suitable for the fixed-precision problem. In particular, randQB_FP can serve as a single-pass algorithm for calculating leading singular values, under certain conditions. With large and/or sparse test matrices, we have empirically validated the merits of the proposed techniques, which exhibit remarkable speedup and memory saving over the blocked randQB algorithm. We have also demonstrated that the single-pass algorithm derived from randQB_FP is much more accurate than an existing single-pass algorithm. Finally, with data from a scenic image and an information retrieval application, we have shown the advantages of the proposed algorithms over the adaptive range finder algorithm for solving the fixed-precision problem. Comment: 21 pages, 10 figures
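The error-tracking idea in the abstract above can be illustrated with a minimal sketch. It assumes only the standard identity that, when Q has orthonormal columns and B = Q^T A, the squared Frobenius error satisfies ||A - QB||_F^2 = ||A||_F^2 - ||B||_F^2, so the error can be updated as rows of B are appended without forming A - QB. The function name `rand_qb_fixed_precision` and the parameters `tol`, `block`, `max_rank`, and `seed` are hypothetical, and this is not claimed to reproduce the authors' algorithms.

```python
import numpy as np

def rand_qb_fixed_precision(A, tol, block=8, max_rank=None, seed=0):
    """Blocked randomized QB with an incremental Frobenius-norm error indicator.

    Illustrative sketch only: grows Q and B block by block until the
    tracked error ||A - QB||_F drops to tol or max_rank is reached.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    max_rank = min(m, n) if max_rank is None else max_rank
    Q = np.zeros((m, 0))
    B = np.zeros((0, n))
    # With no columns in Q, the squared error is simply ||A||_F^2.
    err2 = np.linalg.norm(A, "fro") ** 2
    while Q.shape[1] < max_rank and err2 > tol ** 2:
        b = min(block, max_rank - Q.shape[1])
        Omega = rng.standard_normal((n, b))
        # Sample the range of the residual A - QB without forming it.
        Y = A @ Omega - Q @ (B @ Omega)
        Qi, _ = np.linalg.qr(Y)
        # Re-orthogonalize against the current basis for numerical stability.
        Qi, _ = np.linalg.qr(Qi - Q @ (Q.T @ Qi))
        Bi = Qi.T @ A
        Q = np.hstack([Q, Qi])
        B = np.vstack([B, Bi])
        # Incremental update of the squared error using the new rows of B.
        err2 -= np.linalg.norm(Bi, "fro") ** 2
    # Rounding can push err2 slightly below zero near machine precision.
    return Q, B, float(np.sqrt(max(err2, 0.0)))
```

Because the indicator is a difference of two nearly equal numbers once the approximation is very accurate, a sketch like this is only meaningful for tolerances well above the square root of machine precision relative to ||A||_F.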

arXiv.org e-Print Archive

Old Dominion University

MOMCMC: An Efficient Monte Carlo Method for Multi-Objective Sampling Over Real Parameter Space

Author: Li Yaohang
Publication venue: ODU Digital Commons
Publication date: 01/01/2012

In this paper, we present a new population-based Monte Carlo method, called MOMCMC (Multi-Objective Markov Chain Monte Carlo), for sampling in the presence of multiple objective functions in real parameter space. The MOMCMC method is designed to address the multi-objective sampling problem, which is of interest not only in exploring diversified solutions at the Pareto optimal front in the function space of multiple objective functions, but also those near the front. MOMCMC integrates Differential Evolution (DE) style crossover into Markov Chain Monte Carlo (MCMC) to adaptively propose new solutions from the current population. The significance of dominance is taken into consideration in MOMCMC's fitness assignment scheme while balancing the solution's optimality and diversity. Moreover, the acceptance rate in MOMCMC is used to control the sampling bandwidth of the solutions near the Pareto optimal front. As a result, the computational results of MOMCMC with the high-dimensional ZDT benchmark functions demonstrate its efficiency in obtaining solution samples at or near the Pareto optimal front. Compared to MOSCEM (Multiobjective Shuffled Complex Evolution Metropolis), an existing Monte Carlo sampling method for multi-objective optimization, MOMCMC exhibits significantly faster convergence to the Pareto optimal front. Furthermore, with a small population size, MOMCMC also shows effectiveness in sampling complicated multiobjective function space.
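Two building blocks mentioned in the abstract above, Pareto dominance and a DE-style proposal drawn from the current population, can be sketched as follows. This is a generic illustration under stated assumptions: all objectives are minimized, and the names `dominates`, `nondominated_mask`, `de_proposal`, `gamma`, and `noise` are hypothetical. MOMCMC's fitness assignment and acceptance rule are not described in enough detail here to reproduce, so they are not attempted.

```python
import numpy as np

def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (minimization assumed)."""
    f_a, f_b = np.asarray(f_a), np.asarray(f_b)
    return bool(np.all(f_a <= f_b) and np.any(f_a < f_b))

def nondominated_mask(objectives):
    """Boolean mask marking the members that no other member dominates."""
    F = np.asarray(objectives, dtype=float)
    n = len(F)
    return np.array([
        not any(dominates(F[j], F[i]) for j in range(n) if j != i)
        for i in range(n)
    ])

def de_proposal(population, i, rng, gamma=0.5, noise=1e-4):
    """Propose a candidate for member i from the difference of two other members.

    gamma scales the difference vector and noise is the standard deviation
    of a small Gaussian jitter; both values are illustrative assumptions.
    """
    n, d = population.shape
    others = [k for k in range(n) if k != i]
    r1, r2 = rng.choice(others, size=2, replace=False)
    jitter = rng.normal(0.0, noise, size=d)
    return population[i] + gamma * (population[r1] - population[r2]) + jitter
```

In a full sampler, each proposal would then pass through an acceptance test driven by a fitness that accounts for dominance; that step is deliberately left out of this sketch.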

Old Dominion University

Dinosolve: A Protein Disulfide Bonding Prediction Server Using Context-Based Features to Enhance Prediction Accuracy

Authors: Li Yaohang; Yaseen Ashraf
Publication venue: ODU Digital Commons
Publication date: 01/01/2013

Background: Disulfide bonds play an important role in protein folding and structure stability. Accurately predicting disulfide bonds from protein sequences is important for modeling the structural and functional characteristics of many proteins. Methods: In this work, we introduce an approach to enhancing disulfide bonding prediction accuracy by taking advantage of context-based features. We first derive the first-order and second-order mean-force potentials according to the amino acid environment around the cysteine residues from a large number of cysteine samples. The mean-force potentials are integrated as context-based scores to estimate the favorability of a cysteine residue in disulfide bonding state as well as a cysteine pair in disulfide bond connectivity. These context-based scores are then incorporated as features together with other sequence and evolutionary information to train neural networks for disulfide bonding state prediction and connectivity prediction. Results: The 10-fold cross-validated accuracy is 90.8% at residue level and 85.6% at protein level in classifying an individual cysteine residue as bonded or free, an accuracy improvement of around 2%. The average accuracy for disulfide bonding connectivity prediction is also improved, yielding an overall sensitivity of 73.42% and specificity of 91.61%. Conclusions: Our computational results have shown that the context-based scores are effective features to enhance the prediction accuracies of both disulfide bonding state prediction and connectivity prediction. Our disulfide prediction algorithm is implemented on a web server named Dinosolve available at: http://hpcr.cs.odu.edu/dinosolve

Springer - Publisher Connector

PubMed Central

Old Dominion University

Template-Based C8-Scorpion: A Protein 8 State Secondary Structure Prediction Method Using Structural Information and Context-Based Features

Authors: Li Yaohang; Yaseen Ashraf
Publication venue: ODU Digital Commons
Publication date: 01/01/2014

Background: Secondary structure prediction of proteins is important to many protein structure modeling applications. Correct prediction of secondary structures can significantly reduce the degrees of freedom in protein tertiary structure modeling and therefore reduce the difficulty of obtaining high resolution 3D models. Methods: In this work, we investigate a template-based approach to enhance 8-state secondary structure prediction accuracy. We construct structural templates from known protein structures with certain sequence similarity. The structural templates are then incorporated as features with sequence and evolutionary information to train two-stage neural networks. When structural templates are absent, heuristic structural information is incorporated instead. Results: After applying the template-based 8-state secondary structure prediction method, the 7-fold cross-validated Q8 accuracy is 78.85%. Even templates from structures with only 20% ~ 30% sequence similarity can help improve the 8-state prediction accuracy. More importantly, when good templates are available, the prediction accuracy of less frequent secondary structures, such as 3-10 helices, turns, and bends, is greatly improved, which is useful for practical applications. Conclusions: Our computational results show that the templates containing structural information are effective features to enhance 8-state secondary structure predictions. Our prediction algorithm is implemented on a web server named C8-SCORPION available at: http://hpcr.cs.odu.edu/c8scorpion

Springer - Publisher Connector

PubMed Central

Old Dominion University

Convergence Analysis of Markov Chain Monte Carlo Linear Solvers Using Ulam–von Neumann Algorithm

Authors: Ji Hao; Li Yaohang; Mascagni Michael
Publication venue: ODU Digital Commons
Publication date: 01/01/2013

The convergence of Markov chain–based Monte Carlo linear solvers using the Ulam–von Neumann algorithm for a linear system of the form $x = Hx + b$ is investigated in this paper. We analyze the convergence of the Monte Carlo solver based on the original Ulam–von Neumann algorithm under the conditions that $\|H\| < 1$ as well as $\rho(H) < 1$, where $\rho(H)$ is the spectral radius of $H$. We find that although the Monte Carlo solver is based on sampling the Neumann series, the convergence of the Neumann series is not a sufficient condition for the convergence of the Monte Carlo solver. Actually, properties of $H$ are not the only factors determining the convergence of the Monte Carlo solver; the underlying transition probability matrix plays an important role. An improper selection of the transition matrix may result in divergence even though the condition $\|H\| < 1$ holds. However, if the condition $\|H\| < 1$ is satisfied, we show that there always exist certain transition matrices that guarantee convergence of the Monte Carlo solver. On the other hand, if $\rho(H) < 1$ but $\|H\| \ge 1$, the Monte Carlo linear solver may or may not converge. In particular, if the row sum $\sum_{j=1}^{n} |H_{ij}| > 1$ for every row in $H$ or, more generally, $\rho(H^+) > 1$, where $H^+$ is the nonnegative matrix with $H^+_{ij} = |H_{ij}|$, we show that transition matrices leading to convergence of the Monte Carlo solver do not exist. Finally, given $H$ and a transition matrix $P$, denoting the matrix $H^*$ via $H^*_{ij} = H_{ij}^2 / P_{ij}$, we find that $\rho(H^*) < 1$ is a necessary and sufficient condition for convergence of the Markov chain–based Monte Carlo linear solvers using the Ulam–von Neumann algorithm.
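The final criterion in the abstract above is easy to check numerically. The sketch below builds the matrix with entries H_ij^2 / P_ij and tests whether its spectral radius is below 1. The function names `h_star`, `spectral_radius`, and `mc_solver_converges` are hypothetical, and treating entries with H_ij = 0 as contributing 0 is an assumption made here so that zero transition probabilities are allowed wherever H is zero.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def h_star(H, P):
    """Form the matrix with entries H_ij^2 / P_ij (0 where H_ij == 0)."""
    H = np.asarray(H, dtype=float)
    P = np.asarray(P, dtype=float)
    if H.shape != P.shape:
        raise ValueError("H and P must have the same shape")
    nonzero = H != 0
    # A transition that is needed (H_ij != 0) but never sampled breaks the solver.
    if np.any(nonzero & (P <= 0)):
        raise ValueError("P must be positive wherever H is nonzero")
    out = np.zeros_like(H)
    out[nonzero] = H[nonzero] ** 2 / P[nonzero]
    return out

def mc_solver_converges(H, P):
    """Evaluate the criterion rho(H*) < 1 stated in the abstract."""
    return spectral_radius(h_star(H, P)) < 1.0
```

As a small usage example, take H with rows (0.5, 0.4) and (0.4, 0.5), so every row sum is 0.9. Choosing P equal to |H| gives a spectral radius of 0.9 for the derived matrix and the criterion holds, while choosing P with rows (0.05, 0.85) and (0.85, 0.05) pushes the spectral radius above 5 and the criterion fails, matching the abstract's point that a poor transition matrix can cause divergence even when the norm of H is below 1.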

Old Dominion University

DeepFrag-k: A Fragment-Based Deep Learning Approach for Protein Fold Recognition

Authors: Elhefnawy Wessam; Li Min; Li Yaohang; Wang Jianxin
Publication venue: ODU Digital Commons
Publication date: 01/11/2020

Background: One of the most essential problems in structural bioinformatics is protein fold recognition. In this paper, we design a novel deep learning architecture, called DeepFrag-k, which identifies fold-discriminative features at the fragment level to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the first stage employs a multi-modal Deep Belief Network (DBN) to predict the potential structural fragments given a sequence, represented as a fragment vector, and the second stage uses a deep convolutional neural network (CNN) to classify the fragment vector into the corresponding fold. Results: Our results show that DeepFrag-k yields 92.98% accuracy in predicting the top-100 most popular fragments, which can be used to generate discriminative fragment feature vectors to improve protein fold recognition. Conclusions: There is a set of fragments that can serve as structural “keywords” distinguishing between major protein folds. The deep learning architecture in DeepFrag-k is able to accurately identify these fragments as structure features to improve protein fold recognition.

Old Dominion University